In benchmark tests, the goals are quite different.
The designer wants to measure the likely performance
times and errors expected in normal use. The tasks are
not designed to tax the system or the user, but rather to
be representative of the kinds of frequent tasks the
system will normally support. Typically, tasks are
constructed to measure the expected amount of time it
takes a new user to learn a system, the amount of time it
takes the user to perform a set of predefined tasks, and
the amount of time it takes the system to respond to a
user's request. A good study that illustrates the use of
this method is that of the evaluation of eight text
editors by Roberts and Moran (1983). A study of database
interfaces using benchmarks was done by Mantei and
Cattell (1982).

Kinds of Data Collected

There are four major kinds of data collected in tests
of systems: the time it takes to perform a task, the
frequency and kinds of errors, the goals and intentions
of the users, and the attitude of the user.

The amount of time a task takes (either how long an
entire task takes or how long each successive keystroke
takes) reflects the time it takes the user to perceive
inputs, categorize and plan appropriate actions, and
execute proper responses. Error frequencies and types
reflect the difficulties users have with these processes
and often point to the cause of the error (whether the
error response is similar to one in a similar plan, was
generated from confusion with a similar screen, has a
label that sounds the same as another, etc.). A simple
analysis of users' times and errors is found in Reisner
et al. (1975) and Reisner (1977). A comprehensive
analysis of users' times is found in Card et al. (1980b,
1983). Other uses of times and errors can be found in
Boies (1974), Rosson (1984), Sheppard and Kruesi (1981),
and Thomas and Gould (1975).
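
As a rough illustration of the time and error measures described above, the following minimal sketch summarizes one user's session on a single task. The log format, the event labels, and the error categories are all hypothetical, invented here for illustration; they are not taken from any of the studies cited.

```python
from collections import Counter
from statistics import mean

# Hypothetical session log (an assumption, not from the cited studies):
# (seconds since task start, event label, error category or None)
events = [
    (0.0, "start", None),
    (1.2, "key:d", None),
    (1.9, "key:w", None),
    (4.4, "key:x", "wrong_command"),
    (5.0, "key:u", None),
    (6.1, "key:p", "mode_confusion"),
    (7.5, "key:p", "wrong_command"),
    (9.0, "done", None),
]


def summarize(events):
    """Return whole-task time, mean time between events, and error tallies."""
    times = [t for t, _, _ in events]
    # Time between each pair of successive events (e.g. keystrokes).
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    # Frequency of each kind of error observed during the task.
    errors = Counter(err for _, _, err in events if err is not None)
    return {
        "task_time": times[-1] - times[0],
        "mean_interval": mean(intervals),
        "error_count": sum(errors.values()),
        "errors_by_type": dict(errors),
    }


summary = summarize(events)
print(summary)
```

Keeping the error tally broken down by category, rather than as a single count, preserves the information that points to the likely cause of each error.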

A more thorough, complicated kind of data to collect
during evaluation involves the user's thinking aloud
while performing the task. Typically the user is video-
and sound-recorded while he or she is performing the
tasks. The recording captures what is said and done,
what is displayed on the screen, what sections of the
documentation are being examined, what parts of the task
instructions the user is reviewing, etc. The most